8 research outputs found
Energy Sharing for Multiple Sensor Nodes with Finite Buffers
We consider the problem of finding optimal energy sharing policies that
maximize the network performance of a system comprising multiple sensor
nodes and a single energy harvesting (EH) source. Sensor nodes periodically
sense the random field and generate data, which is stored in the corresponding
data queues. The EH source harnesses energy from ambient energy sources and the
generated energy is stored in an energy buffer. Sensor nodes receive energy for
data transmission from the EH source. The EH source has to efficiently share
the stored energy among the nodes in order to minimize the long-run average
delay in data transmission. We formulate the problem of energy sharing between
the nodes in the framework of average cost infinite-horizon Markov decision
processes (MDPs). We develop efficient energy sharing algorithms, namely a
Q-learning algorithm with exploration mechanisms based on the ε-greedy
method as well as the upper confidence bound (UCB). We extend these algorithms by
incorporating state and action space aggregation to tackle state-action space
explosion in the MDP. We also develop a cross entropy based method that
incorporates policy parameterization in order to find near optimal energy
sharing policies. Through simulations, we show that our algorithms yield energy
sharing policies that outperform the heuristic greedy method. Comment: 38 pages, 10 figures
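The ε-greedy exploration used by the Q-learning algorithm above can be sketched in tabular form. The two-state chain environment and all names below are illustrative stand-ins, not the paper's energy-sharing MDP:

```python
import random

def epsilon_greedy_q_learning(n_states, n_actions, step, steps=2000,
                              alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` returns (next_state, reward); the environment is supplied
    by the caller. This is a generic sketch, not the paper's algorithm.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s_next, r = step(s, a)
        # Standard Q-learning update toward the bootstrapped target.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
    return Q

# Toy 2-state chain: action 1 moves to state 1, which pays reward 1.
def toy_step(s, a):
    s_next = 1 if a == 1 else 0
    return s_next, 1.0 if s_next == 1 else 0.0

Q = epsilon_greedy_q_learning(2, 2, toy_step)
```

The abstract's state- and action-space aggregation would group rows and columns of this Q-table; the sketch keeps the full table for clarity.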
A simulation-based algorithm for optimal pricing policy under demand uncertainty
We propose a simulation-based algorithm for computing the optimal pricing policy for a product under uncertain demand dynamics. We consider a parameterized stochastic differential equation (SDE) model for the uncertain demand dynamics of the product over the planning horizon; in particular, we consider a dynamic model that extends the Bass model. The performance of our algorithm is compared to that of a myopic pricing policy and is shown to give better results. Our algorithm has two significant advantages: (a) it does not require information on the system model parameters, provided the SDE system state is known via either a simulation device or real data, and (b) it works efficiently even for high-dimensional parameters, since it uses the efficient smoothed functional gradient estimator.
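A toy version of a smoothed functional (SF) gradient estimator of the kind the abstract refers to: every coordinate of the parameter is perturbed at once with Gaussian noise, so each estimate costs two function evaluations regardless of dimension. The quadratic objective, step sizes, and function names below are illustrative assumptions, not the paper's pricing model:

```python
import random

def sf_gradient(J, theta, beta=0.1, samples=200, seed=0):
    """Two-measurement smoothed functional gradient estimate.

    Perturbs theta by +/- beta * eta with Gaussian eta and averages
    eta * (J(theta + beta*eta) - J(theta - beta*eta)) / (2*beta),
    which approximates the gradient of a smoothed version of J.
    """
    rng = random.Random(seed)
    d = len(theta)
    grad = [0.0] * d
    for _ in range(samples):
        eta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + beta * e for t, e in zip(theta, eta)]
        minus = [t - beta * e for t, e in zip(theta, eta)]
        diff = (J(plus) - J(minus)) / (2.0 * beta)
        for i in range(d):
            grad[i] += eta[i] * diff / samples
    return grad

# Minimize the toy objective J(theta) = sum((theta_i - 3)^2) by
# gradient descent using only function evaluations of J.
J = lambda th: sum((t - 3.0) ** 2 for t in th)
theta = [0.0, 0.0]
for _ in range(100):
    g = sf_gradient(J, theta)
    theta = [t - 0.05 * gi for t, gi in zip(theta, g)]
```

The per-estimate cost being independent of the parameter dimension is what makes this class of estimators attractive for high-dimensional parameters, as claimed in (b).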
Shaping Proto-Value Functions Using Rewards
In reinforcement learning (RL), an important sub-problem is learning the value function, and this is chiefly influenced by the architecture used to represent value functions. The value function is often expressed as a linear combination of a pre-selected set of basis functions. These basis functions are either selected in an ad-hoc manner or are tailored to the RL task using domain knowledge. Selecting basis functions in an ad-hoc manner does not give a good approximation of the value function, while choosing functions using domain knowledge introduces dependency on the task. Thus, a desirable scenario is to have a method of choosing basis functions that is task independent, but which also provides a good approximation of the value function. In this paper, we propose a novel task-independent basis function construction method that uses the topology of the underlying state space and the reward structure to build reward-based Proto-Value Functions (RPVFs). The proposed approach gives a good approximation of the value function and enhanced learning performance, as demonstrated via experiments on grid-world tasks.
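The standard (reward-free) proto-value function construction that RPVFs build on can be sketched as the smoothest eigenvectors of the state graph's Laplacian. The grid size, 4-connectivity, and function names below are illustrative assumptions; the paper's RPVFs additionally fold the reward structure into the construction:

```python
import numpy as np

def proto_value_functions(n, k):
    """Proto-value functions for an n x n grid world.

    Builds the 4-connected grid graph over the n*n states, forms the
    combinatorial Laplacian L = D - A, and returns the k eigenvectors
    with the smallest eigenvalues as basis functions. These capture
    the topology of the state space independently of any task.
    """
    N = n * n
    A = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n:              # edge to the state below
                A[i, i + n] = A[i + n, i] = 1.0
            if c + 1 < n:              # edge to the state on the right
                A[i, i + 1] = A[i + 1, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]       # smoothest k basis functions

vals, basis = proto_value_functions(4, 3)
```

A value-function approximation would then be a linear combination of the columns of `basis`, with weights fitted by an RL algorithm such as least-squares policy evaluation.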